397 research outputs found
A Survey on Automatic Parameter Tuning for Big Data Processing Systems
Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues, yet regular users and even expert administrators struggle to understand and tune them to achieve good performance. We investigate existing approaches to parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise open research problems for automatic parameter tuning.
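The first of the six categories, rule-based tuning, can be illustrated with a minimal sketch. Everything below is a hypothetical example, not a method from the survey: the function name and the specific rules of thumb (at most 5 cores per executor, reserving one core and 1 GB per node for the OS, one slot for the driver, and a 10% memory-overhead margin) are illustrative assumptions.

```python
def rule_based_spark_config(node_mem_gb: int, node_cores: int, num_nodes: int) -> dict:
    """Hypothetical rule-of-thumb heuristic for a few Spark parameters,
    derived only from static cluster specs (no profiling runs)."""
    # Common folk rule: cap cores per executor to limit HDFS I/O contention.
    cores_per_executor = min(5, node_cores)
    # Reserve one core per node for OS/daemons, then pack executors.
    executors_per_node = max(1, (node_cores - 1) // cores_per_executor)
    # Reserve ~1 GB per node, split the rest, keep ~10% for memory overhead.
    mem_per_executor_gb = int((node_mem_gb - 1) / executors_per_node * 0.9)
    return {
        "spark.executor.cores": cores_per_executor,
        # Reserve one executor slot cluster-wide for the driver.
        "spark.executor.instances": executors_per_node * num_nodes - 1,
        "spark.executor.memory": f"{mem_per_executor_gb}g",
    }
```

Such rules are cheap and need no experiments, but, as the survey's classification implies, they cannot adapt to workload characteristics the way experiment-driven or machine-learning approaches can.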
DialectGram: Detecting Dialectal Variation at Multiple Geographic Resolutions
Several computational models have been developed in recent years to detect and analyze dialect variation. Most of these models assume a predefined set of geographical regions over which they detect and analyze dialectal variation. However, dialect variation occurs at multiple levels of geographic resolution, ranging from cities within a state to states within a country to countries across continents. In this work, we propose a model that enables detection of dialectal variation at multiple levels of geographic resolution, obviating the need for an a priori definition of the resolution level. Our method, DialectGram, learns dialect-sensitive word embeddings while remaining agnostic to the geographic resolution. Specifically, it requires only one-time training and enables post-hoc analysis of dialectal variation at a chosen resolution -- a significant departure from prior models, which must be re-trained whenever the predefined set of regions changes. Furthermore, DialectGram explicitly models senses, enabling one to estimate the proportion of each sense's usage in any given region. Finally, we quantitatively evaluate our model against other baselines on a new evaluation dataset, DialectSim (in English), and show that DialectGram can effectively model linguistic variation.
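The per-region sense-proportion estimate the abstract mentions can be sketched as follows. This is a generic illustration, not DialectGram's actual procedure: the function names are hypothetical, and it simply assigns each usage's context vector to the nearest sense vector by cosine similarity and counts assignments within a region.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def sense_proportions(context_vecs, sense_vecs):
    """Assign each usage (its context vector) to the nearest sense
    embedding and return the fraction of usages per sense."""
    counts = [0] * len(sense_vecs)
    for c in context_vecs:
        best = max(range(len(sense_vecs)), key=lambda s: cosine(c, sense_vecs[s]))
        counts[best] += 1
    total = len(context_vecs)
    return [n / total for n in counts]
```

Run once per region over that region's usages, this yields the kind of sense-usage distribution the abstract describes, without retraining when the region set changes.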
Judging a video by its bitstream cover
Classifying videos into distinct categories, such as Sport and Music Video,
is crucial for multimedia understanding and retrieval, especially in an age
where an immense volume of video content is constantly being generated.
Traditional methods require video decompression to extract pixel-level features
like color, texture, and motion, thereby increasing computational and storage
demands. Moreover, these methods often suffer from performance degradation in
low-quality videos. We present a novel approach that examines only the
post-compression bitstream of a video to perform classification, eliminating
the need for decompression. We validate our approach using a custom-built
dataset comprising over 29,000 YouTube video clips, totaling 6,000 hours and
spanning 11 distinct categories. Our preliminary evaluations indicate
precision, accuracy, and recall rates well over 80%. The algorithm operates
approximately 15,000 times faster than real-time for 30 fps videos,
outperforming the traditional Dynamic Time Warping (DTW) algorithm by six
orders of magnitude.
- …